Similar resources
Optimization of Search Results with De-Duplication of Web Pages In a Mobile Web Crawler
In the information era, search engines are the primary gateways for accessing information on the web. Their efficiency and reliability are significantly affected by the large amount of duplicate content on the World Wide Web. Web storage indexes are also burdened by duplicate documents, which slows down the serving of results...
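The de-duplication this abstract describes can be approximated by fingerprinting each downloaded page and discarding pages whose fingerprint has already been seen. This is a minimal sketch, not the paper's method: the normalization step and the example URLs are assumptions for illustration.

```python
import hashlib

def content_fingerprint(html: str) -> str:
    # Hash lightly normalized text (lowercased, whitespace collapsed)
    # so trivial byte-level differences do not hide duplicates.
    normalized = " ".join(html.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(pages):
    """Keep only the first page seen for each content fingerprint."""
    seen, unique = set(), []
    for url, html in pages:
        fp = content_fingerprint(html)
        if fp not in seen:
            seen.add(fp)
            unique.append((url, html))
    return unique

# Hypothetical crawl output: the second page duplicates the first
# after normalization, so only two pages survive.
pages = [
    ("http://a.example/1", "Hello  World"),
    ("http://b.example/1", "hello world"),
    ("http://a.example/2", "Another page"),
]
unique = deduplicate(pages)
```

Exact-hash matching only catches byte-identical (after normalization) duplicates; near-duplicate detection in real systems typically uses shingling or SimHash instead.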
World Wide Web Crawler
We describe our ongoing work on world-wide web crawling: a scalable web crawler architecture that can use resources distributed world-wide. The architecture allows us to use loosely managed compute nodes (PCs connected to the Internet) and may save significant network bandwidth. In this poster, we discuss why such an architecture is necessary and point out difficulties in designing such architectu...
Reinforcement-Based Web Crawler
This paper presents a focused web crawler system that automatically creates minority-language corpora. The system uses a database of relevant and irrelevant documents for testing the relevance of retrieved web documents. It requires a starting web document to indicate where the search should begin.
Web Crawler Architecture
Definition: A web crawler is a program that, given one or more seed URLs, downloads the web pages associated with these URLs, extracts any hyperlinks contained in them, and recursively continues to download the web pages identified by those hyperlinks. Web crawlers are an important component of web search engines, where they are used to collect the corpus of web pages indexed by the search engin...
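The seed-and-recurse process in this definition amounts to a breadth-first traversal of the link graph with a visited set. The sketch below captures just that logic; the `fetch` callback and the toy in-memory "web" are assumptions standing in for real HTTP requests and HTML link extraction.

```python
from collections import deque

def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: download each URL, extract links, recurse.

    `fetch(url)` is assumed to return (page_text, list_of_hyperlinks);
    in a real crawler it would issue an HTTP request and parse the HTML.
    """
    seen = set(seeds)          # every URL enqueued exactly once
    frontier = deque(seeds)    # FIFO queue => breadth-first order
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

# Toy three-page web with a link cycle between "a" and "b".
web = {
    "a": ("page a", ["b", "c"]),
    "b": ("page b", ["a"]),
    "c": ("page c", []),
}
result = crawl(["a"], lambda url: web[url])
```

The visited set is what keeps the recursion from looping on link cycles; production crawlers add politeness delays, robots.txt checks, and URL normalization on top of this skeleton.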
An Effective Method for Ranking of Changed Web Pages in Incremental Crawler
The World Wide Web is a large, global repository of text documents, images, multimedia, and much other information, referred to as information resources. A large amount of new information is posted on the Web every day. A web crawler is a program that fetches information from the World Wide Web in an automated manner. The crawler keeps visiting pages after the collection reaches its target size,...
Journal
Journal title: International Journal on Web Service Computing
Year: 2015
ISSN: 2230-7702,0976-9811
DOI: 10.5121/ijwsc.2015.6101